{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[source](../../api/alibi_detect.od.ae.rst)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Auto-Encoder"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "The Auto-Encoder (AE) outlier detector is first trained on a batch of unlabeled, but normal (inlier) data. Unsupervised training is desireable since labeled data is often scarce. The AE detector tries to reconstruct the input it receives. If the input data cannot be reconstructed well, the reconstruction error is high and the data can be flagged as an outlier. The reconstruction error is  measured as the mean squared error (MSE) between the input and the reconstructed instance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage\n",
    "\n",
    "### Initialize\n",
    "\n",
    "Parameters:\n",
    "\n",
    "* `threshold`: threshold value above which the instance is flagged as an outlier.\n",
    "\n",
    "* `encoder_net`: `tf.keras.Sequential` instance containing the encoder network. Example:\n",
    "\n",
    "```python\n",
    "encoder_net = tf.keras.Sequential(\n",
    "  [\n",
    "      InputLayer(input_shape=(32, 32, 3)),\n",
    "      Conv2D(64, 4, strides=2, padding='same', activation=tf.nn.relu),\n",
    "      Conv2D(128, 4, strides=2, padding='same', activation=tf.nn.relu),\n",
    "      Conv2D(512, 4, strides=2, padding='same', activation=tf.nn.relu)\n",
    "  ])\n",
    "```\n",
    "\n",
    "* `decoder_net`: `tf.keras.Sequential` instance containing the decoder network. Example:\n",
    "\n",
    "```python\n",
    "decoder_net = tf.keras.Sequential(\n",
    "  [\n",
    "      InputLayer(input_shape=(1024,)),\n",
    "      Dense(4*4*128),\n",
    "      Reshape(target_shape=(4, 4, 128)),\n",
    "      Conv2DTranspose(256, 4, strides=2, padding='same', activation=tf.nn.relu),\n",
    "      Conv2DTranspose(64, 4, strides=2, padding='same', activation=tf.nn.relu),\n",
    "      Conv2DTranspose(3, 4, strides=2, padding='same', activation='sigmoid')\n",
    "  ])\n",
    "```\n",
    "\n",
    "* `ae`: instead of using a separate encoder and decoder, the AE can also be passed as a `tf.keras.Model`.\n",
    "\n",
    "* `data_type`: can specify data type added to metadata. E.g. *'tabular'* or *'image'*.\n",
    "\n",
    "Initialized outlier detector example:\n",
    "\n",
    "```python\n",
    "from alibi_detect.od import OutlierAE\n",
    "\n",
    "od = OutlierAE(threshold=0.1,\n",
    "               encoder_net=encoder_net,\n",
    "               decoder_net=decoder_net)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fit\n",
    "\n",
    "We then need to train the outlier detector. The following parameters can be specified:\n",
    "\n",
    "* `X`: training batch as a numpy array of preferably normal data.\n",
    "\n",
    "* `loss_fn`: loss function used for training. Defaults to the Mean Squared Error loss.\n",
    "\n",
    "* `optimizer`: optimizer used for training. Defaults to [Adam](https://arxiv.org/abs/1412.6980) with learning rate 1e-3.\n",
    "\n",
    "* `epochs`: number of training epochs.\n",
    "\n",
    "* `batch_size`: batch size used during training.\n",
    "\n",
    "* `verbose`: boolean whether to print training progress.\n",
    "\n",
    "* `log_metric`: additional metrics whose progress will be displayed if verbose equals True.\n",
    "\n",
    "\n",
    "```python\n",
    "od.fit(X_train, epochs=50)\n",
    "```\n",
    "\n",
    "It is often hard to find a good threshold value. If we have a batch of normal and outlier data and we know approximately the percentage of normal data in the batch, we can infer a suitable threshold:\n",
    "\n",
    "```python\n",
    "od.infer_threshold(X, threshold_perc=95)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Detect\n",
    "\n",
    "We detect outliers by simply calling `predict` on a batch of instances `X`. Detection can be customized via the following parameters:\n",
    "\n",
    "* `outlier_type`: either *'instance'* or *'feature'*. If the outlier type equals *'instance'*, the outlier score at the instance level will be used to classify the instance as an outlier or not. If *'feature'* is selected, outlier detection happens at the feature level (e.g. by pixel in images).\n",
    "\n",
    "* `outlier_perc`: percentage of the sorted (descending) feature level outlier scores. We might for instance want to flag an image as an outlier if at least 20% of the pixel values are on average above the threshold. In this case, we set `outlier_perc` to 20. The default value is 100 (using all the features).\n",
    "\n",
    "* `return_feature_score`: boolean whether to return the feature level outlier scores.\n",
    "\n",
    "* `return_instance_score`: boolean whether to return the instance level outlier scores.\n",
    "\n",
    "The prediction takes the form of a dictionary with `meta` and `data` keys. `meta` contains the detector's metadata while `data` is also a dictionary which contains the actual predictions stored in the following keys:\n",
    "\n",
    "* `is_outlier`: boolean whether instances or features are above the threshold and therefore outliers. If `outlier_type` equals *'instance'*, then the array is of shape *(batch size,)*. If it equals *'feature'*, then the array is of shape *(batch size, instance shape)*.\n",
    "\n",
    "* `feature_score`: contains feature level scores if `return_feature_score` equals True.\n",
    "\n",
    "* `instance_score`: contains instance level scores if `return_instance_score` equals True.\n",
    "\n",
    "\n",
    "```python\n",
    "preds = od.predict(X,\n",
    "                   outlier_type='instance',\n",
    "                   outlier_perc=75,\n",
    "                   return_feature_score=True,\n",
    "                   return_instance_score=True)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examples\n",
    "\n",
    "### Image\n",
    "\n",
    "[Outlier detection on CIFAR10](../../examples/od_ae_cifar10.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
